Potential protective effect against SARS-CoV-2 infection by APOE rs7412 polymorphism

The pandemic burden caused by the SARS-CoV-2 coronavirus constitutes a global public health emergency. Increasing understanding about predisposing factors to infection and severity is now a priority. Genetic, metabolic, and environmental factors can play a crucial role in the course and clinical outcome of COVID-19. We aimed to investigate the putative relationship between genetic factors associated to obesity, metabolism and lifestyle, and the presence and severity of SARS-CoV-2 infection. A total of 249 volunteers (178 women and 71 men, with mean and ± SD age of 49 ± 11 years) characterized for dietary, lifestyle habits and anthropometry, were studied for presence and severity of COVID-19 infection, and genotyped for 26 genetic variants related to obesity, lipid profile, inflammation, and biorhythm patterns. A statistically significant association was found concerning a protective effect of APOE rs7412 against SARS-CoV-2 infection (p = 0.039; OR 0.216; CI 0.084, 0.557) after correction for multiple comparisons. This protective effect was also ascribed to the APOɛ2 allele (p = 0.001; OR 0.207; CI 0.0796, 0.538). The genetic variant rs7412 resulting in ApoE2, genetic determinant of lipid and lipoprotein levels, could play a significant role protecting against SARS-CoV-2 infection.


Results
Descriptive analysis of COVID-19 disease in the GENYAL platform. A study to identify single nucleotide genetic polymorphisms involved in the SARS-CoV-2 infection was conducted in the GENYAL population. A total of 371 volunteers from this cohort answered the COVID-19 questionnaire designed for this study. COVID-19 questionnaire plus genotyping data were finally restricted to a total of 249 subjects with enough reliable data (178 women and 71 men, with mean and ± SD age of 49 ± 11 years) presented in Table 1. Of the total sample, 210 subjects were considered young adults (< 60 years old, mean age of 46 ± 10 years) while 39 subjects were senior adults (≥ 60 years old; 65 ± 3 years). Table 1 shows the descriptive data of the study sample categorized by gender, age, and infection status ("infected": 50; "not infected": 141 subjects; and volunteers with suspected SARS-CoV-2 infection without diagnosis, labeled "not clear": 58). Considering the potential impact of comorbidities on SARS-CoV-2 susceptibility and outcome of SARS-CoV-2 infection, possible associations between comorbidities and SARS-CoV-2 infection status were tested and the results of the analysis are presented in Table 1. No associations were found between SARS-CoV-2 infection status and the studied comorbidities, with the exception of immunosuppression. The data according to the severity of the symptoms (for those infected) are shown in Table S1. No significant associations were found between the studied SNPs involved in lipid metabolism, obesity, inflammation and biorhythms, and COVID-19 severity. Likewise, no positive association was found between SARS-CoV-2 infection and the lifestyle parameters analyzed.
Associations with genetic variants related to metabolism, obesity, inflammation, and chronobiology on SARS-CoV-2 infection. The objective of this study was to identify putative associations between SARS-CoV-2 infection and genetic factors involved in metabolic or inflammation processes. In this way, 26 SNPs corresponding to genes related to lipid metabolism, obesity, inflammation, and biorhythms were tested for their association with SARS-CoV-2 infection and severity. The frequencies obtained were similar to the European frequencies and the Hardy Weinberg equilibrium was calculated as shown in Table S2. Ordinal logistic regression models were derived for each SNP, adjusted by sex, age, risk factors, risk of virus exposition, hypertension, symptoms severity, and smoking status. An additive model was adopted for the SNPs, where the effect of a homozygotic minor allele genotype was twice the effect of the heterozygotic allele genotype. As predicted endpoint, the three-levels infection status were used ("not infected" vs "not clear" vs "infected"). Figure 1 displays the odds-ratios and p values (before and after Bonferroni correction for multiple tests) obtained for each of the SNPs in these models.

APOE rs7412 and APOE2 involvement in SARS-CoV-2 infection. A significant association between
the genetic variant APOE rs7412 and SARS-CoV-2 infection status was found after adjusting for multiple testing corrections. For the additive model, a higher protective effect for each T allele was found with respect to SARS-CoV-2 infection (OR: 0.22; CI: 0.084, 0.557; p = 0.039), which is present in the minor ɛ2 allele of APOE. Also, for the dominant model (CC vs CT + TT) the higher protective effect for the T allele (OR: 0.213; CI: 0.081, 0.561; p = 0.046) was found. These results are presented in Table 2. Bootstrap statistics are displayed in Table S3, which shows a small optimism correction and good calibration, signatures of good external predictive power of the model. Figure 2 describes the distribution of rs7412 genotypes vs infection status. rs7412 has a T minor allele and a C major allele; in our sample, the CC, CT, and TT genotypes were present in a total of 218, 30  www.nature.com/scientificreports/ respectively. As observed in Fig. 2, for the additive model, when moving from the "not infected" to the "not clear" to the "infected" group, there is a proportional enrichment of the CC genotype in the genetic composition of the corresponding individuals, while the TT genotype is only present in the "not infected" group; the heterozygote also shows a decrease in its contribution upon higher levels of infection, but in a lower proportion.   www.nature.com/scientificreports/ APOE gene has three main alleles, ɛ2, ɛ3 and ɛ4, coding for three corresponding isoforms of apolipoprotein E: E2, E3 and e4, respectively. Table S4 displays the distribution of genotypes in our samples. The main isoform E3 and the minor isoform E4 of ApoE protein have strong affinity to Low Density Lipoprotein Receptor (LDLR) and have been associated to a higher risk of Alzheimer's disease. On the contrary, E2 isoform has been described as neuroprotective, and presents low affinity for the LDL receptor. In terms of sequences, these isoforms are defined by the amino acids occurring at positions 112 and 158, and determined by rs429358 together with rs7412: E2 isoform: Cys112 (rs429358-T), Cys158 (rs7412-T); E3 isoform: Cys112 (rs429358-T), Arg158 (rs7412-C) and E4 isoform has Arg112 (rs429358-C), Arg158 (rs7412-C). Thus, to determine a putative association between APOE isoform and SARS-CoV-2infection, we next evaluated the combination of the two SNPs rs7412 and rs429358 in the risk of infection by mean of an ordinal logistic model with the same covariables as before, by using a model that included two variables representing the number of ε2 and ε4 alleles, respectively, in the genotype. Table 3 describes the distribution of the three alleles vs infection classes, together with the statistics of the resulting model where a protective effect was found only for ɛ2 allele (OR: 0.207; CI: 0.0796, 0.538; p = 0.001). No significant association was found for ɛ4 (OR: 0.784; 0.402, 1.53; p = 0.476). On the basis of these results, ɛ2 allele seems to be a favorable marker against SARS-CoV-2 infection.

T allele of CLOCK rs3749474 might be involved in a potential increased risk of SARS-CoV-2 infection.
In addition to the association observed with APOE genetic variant, the analysis of the genotypes performed showed a positive association between the risk of SARS-CoV-2 infection and CLOCK gene rs3749474 polymorphism (p = 0.024), although statistical significance was lost after Bonferroni correction (Fig. 1). This genetic variant has been associated with modulation of metabolic traits, such as energy intake 16 , obesity 17 , and biorhythms 18,19 , and it has also been related to variations in the levels of several cytokines 16 . The best fit was obtained with the additive model, in which each copy of the minor allele T was associated with an increased risk of COVID-19 disease (OR = 1.59, CI: 1.06-2.38) ( Table 4). The genotype frequencies obtained from this polymorphism were similar to those described for the European population when analyzing the study population as a whole. However, when stratifying for positive infection, it was observed that among the participants infected by SARS-CoV-2, the TT genotype was present in 22% of the participants, meanwhile only 7.8% of this genotype was present in the group of uninfected subjects (Fig. 3).
In view of these results, an interaction model between APOE rs7412 and CLOCK rs3749474 variants was performed. We noted that the combination of the two polymorphisms together significantly enhanced the predictive effect that APOE variant on the risk of SARS-CoV-2 infection compared to the individual analysis (p = 0.02), suggesting a cooperative role of both genetic variants in relation to the infectivity of the virus.
This remarkable finding could be an initial indicator of the potential predictive value of polymorphic variants in the CLOCK gene on SARS-CoV2 infection, as well as the correlation that the two SNPs, APOE and CLOCK, have together regarding susceptibility to COVID-19 infection.
Finally, we investigated the possible modulating effects of diet or medication on SARS-CoV-2 infection. Healthy eating index (IAS), energy intake, total lipids and carbohydrates, simple sugars and saturated fatty acids consumption were tested, as well as consumption of anti-inflammatory, hypotensive or lipid lowering drugs for medication. No significant correlation with SARS-CoV-2 infection or symptoms severity was found in any case.

Discussion
The present work has focused on the study of genetic factors involved in SARS-CoV-2 infection and its severity in order to broaden the knowledge of the genetic influence on the probability of infection. To this end, single nucleotide genetic variants in genes related to lipid metabolism, inflammation and biorhythms have been analyzed.
A statistically significant association between the genetic variant APOE rs7412 and the SARS-CoV-2 infection status was detected for the additive and dominant models, that remained significant after Bonferroni multiple-test correction. Both models suggest that the minor T allele (dominant model) or each copy of the T allele (additive www.nature.com/scientificreports/ model), would elicits a protective effect against SARS-CoV-2 infection. Nonetheless, given the cross-sectional nature of this study, it is not possible to fully demonstrate from here a causal involvement of the ε2 allele in the observed decrease in infection. However, the significant association reported in this work opens the door to future controlled longitudinal trials capable of confirming this hypothesis, of great interest given the need to understand the risk factors of COVID-19 disease. This protective effect may be related to multiple processes where apolipoprotein E is implicated including cardiovascular function, immune system and inflammation 20,21 . Previous studies have evidenced the association of APOE alleles with susceptibility to infection by viruses, with elevated levels of lipoprotein(a) (Lp(a)), with local extracellular vesicle biogenesis in the brain, with the levels of the Willebrand factor and the modulation of angiotensin converting enzyme 2 (ACE2) 20,[22][23][24][25][26][27] . These processes may be related to a prothrombotic condition which, together with the inflammatory character of SARS-CoV-2 infection, may lead to a higher infection capacity and severity. In addition, different isoforms of apolipoprotein E can give rise to multiple phenotypes, with numerous factors intervening in their biogenesis and expression regulation. For this reason, the multifaceted associations to APOE gene require multiple approaches to be taken into account 20 .
The knowledge provided in the current literature regarding COVID-19, refers mainly to APOE alleles (ɛ2, ɛ3, and ɛ4) in the population due to the multiallelic nature of apolipoprotein E gene. Nonetheless, the results of the present study revealed a protective effect of the minor T allele of the SNP rs7412 which is found only in the minor ɛ2 allele of APOE. Consequently, these results are complementary to those observed in other studies, where the ɛ4 allele increases the risk of SARS-CoV-2 infection and severity of COVID-19 the symptoms [27][28][29][30] .
In addition, genome wide significant associations studies (GWAS) related to rs7412 (minor T allele) and higher odds of surviving and longevity have been found 31,32 . On one hand, studies have shown that the rs7412 genetic variant had the strongest association with LDL-TG and RLP-C as well as a significant association with increased TG and HDL-C levels, and with decreased LDL-TG, LDL-C, total cholesterol, non-HDL-C, and Lp(a) levels 33 . In addition, several studies have reported a significant association between the APOE2-determining allele of rs7412 and Lp(a) concentrations 34,35 . These investigations similarly revealed a decrease in Lp(a) in the population carrying the allele associated with rs7412 34,35 . In the same time, elevated levels of this lipoprotein have been proposed to be associated with the risk of SARS-CoV-2 infection and with a very high risk of thrombosis 23 . Therefore, these mechanisms could help to explain the potential influence of this genetic variant on SARS-CoV-2  www.nature.com/scientificreports/ infection. For this reason, this study may help to provide new avenues for studying ApoE purposes, evaluating not only the effect exerted by ɛ4 but also the protective effect of rs7421 and ɛ2 allele against COVID-19. Subsequently, the effect of the APOE gene alleles and the risk of SARS-CoV-2 infection was assessed. As alleles ɛ2, ɛ3 and ɛ4 in APOE gene correspond to genotypes rs7412-T/rs429358-T; rs7412-C/rs429358-T; and rs7412-C/rs429358-C, respectively, we evaluated the combination of the two SNPs rs7412 and rs429358 in the risk of infection. In the resulting model, a protective effect was found only for ɛ2 allele after adjusting for the Bonferroni correction. These results, consequently, are in accordance with our hypothesis on the protective effect exerted by the ɛ2 allele. In the same time, published literature reports the beneficial character of apolipoprotein E2 according to cardiovascular diseases, neuroprotection, and other age-related disease traits and survival implicated in the promotion of longevity 27,36,37 . This way, a lower affinity for the LDL receptor, reduced hypertension and/or increased neuroprotective capacity are proposed mechanisms which may contribute to the protective mechanism against SARS-CoV-2 infection 38,39 . Therefore, the multifaceted nature of APOE pleomorphic gene, specifically the ɛ2 allele, may have a protective effect against COVID-19 disease.
Besides that, as COVID-19 pandemic evolves, several discouraging long-term effects are being observed. These include neurological manifestations such as smell/taste disorders and non-specific and possibly systemic neuropathological symptoms 40 . This seems to be due to the neurotropic character of the virus, where potential entry of the virus into the vascular endothelial cells of the brain is possible 40,41 . In this scenario, a possible neuroprotective effect could be also hypothesized by the rs7412 or the ɛ2 allele, given its beneficial character in relation to neurodegenerative mechanisms 41 .
Diet and medication do not seem to interfere with infection status and rs7412. However, more studies should be conducted in this direction. This study has been performed in a mainly non-pathological population and a larger sample size may be necessary for future analysis.
The immuno-inflammatory activation as a consequence of a viral infection is a complex system which is affected by multiple factors, including the host circadian clock 14,42 . Disruptions in circadian rhythms have been reported to affect both viral replication and dissemination within the host, as well as innate and adaptive immune responses, which can modulate the infective capacity of the virus and its clinical manifestations 13,14,42,43 . Regarding the infection by SARS-CoV-2, in addition to the previously mentioned, it has been reported that the circadian clock machinery is closely involved in the transcriptional expression of ACE2, the main responsible receptor for the viral entry of SARS-CoV-2 into the cells 44 . In view of the previous evidence, several authors accentuated the urgent need to study the potential contribution of circadian rhythms on COVID-19 disease to clarify whether this relationship can play a role in the spread of this pandemic [45][46][47] . In this sense, our study shows a positive association between the minor T allele of the rs3749474 variant of the CLOCK gene (a core component of the circadian clock) and the risk of infection by SARS-CoV-2 (p = 0.024), although this significance was lost after adjusting for Bonferroni correction, probably due to the low sample size. Despite the fact that the genotype frequencies of this polymorphism were similar to those described in the European population, infected subjects presented an increased incidence of TT + CT genotypes compared to non-infected subjects, pointing to a higher infection rate in the T allele carriers of this polymorphism. This experimental finding describes for the first time a potential direct relationship between a gene involved in the circadian clock and the modulation of SARS-CoV-2 infection, thus opening the door to future studies in this field.
The rs3749474 polymorphism of the CLOCK gene has also been associated with altered metabolic features. Subjects who carry the T allele of this variant show a higher energy intake 16 BMI and risk of obesity 17,48 . After numerous clinical reports, it is a statement that obese patients, and especially those with a higher BMI, are severely affected by COVID-19 and present a worse outcome than patients with lower BMI [49][50][51] . Again, this supports the link described in our study between the rs3749474 polymorphism and SARS-CoV-2 infection.
It is noteworthy that of all the polymorphisms studied in our population, APOE and CLOCK polymorphisms have shown significant associations with SARS-CoV-2 infection, and their statistical significance is also enhanced when they are analyzed together in the same model. This suggests a cooperative role of both genes related to the infectivity of this virus. Other authors have previously described cases where the CLOCK gene and APOE genotypes can interact together modulating the progression of inflammatory-based diseases 52,53 . This reinforces the cooperative role proposed for these polymorphisms on the susceptibility to infection by SARS-CoV-2, as well as the progression and severity of other inflammatory-based diseases in which these genes may play a major role.
The relationship between the severity of the symptoms in the group of individuals infected by SARS-CoV-2 and the analyzed genetic polymorphisms was also assessed. However, no significance was noticed for any of the SNPs tested. These results could suggest that APOE and CLOCK could be involved in modulating the risk of SARS-CoV-2 infection as reported, but they may not have a major influence on the progress of the disease once established. However, our study has some limitations. In the first place, the sample size of the infected population might not be large enough to appreciate a significant influence of APOE and CLOCK in the severity of symptoms. Therefore, it would be advisable to test this association in a large number of infected subjects to ensure if these factors could affect the severity of symptoms caused by COVID-19 disease. Eventually, because of the complicated epidemiological situation Spain was suffering while the study was being performed, the limited availability of diagnostic tests, and the retrospective nature of the study, it was not possible to verify firmly the participants' SARS-CoV-2 infection. Nonetheless, a validation and verification mechanism of past infection would have been interesting and will be considered in new investigations. Future studies with larger sample size are warranted to contrast the results obtained in this research. Additionally, analyzing these associations in an ethnically diverse population could be relevant as ethnicity has an impact on these variants.
To conclude, although more studies need to be conducted to corroborate the current results, we observed than the minority allele (T) of the genetic variant APOE rs7412 and the APOɛ2 allele, may confer a certain protection against SARS-CoV-2 infection. This observation will allow the performance of fully longitudinal studies www.nature.com/scientificreports/ to confirm this novel hypothesis from a causative point of view. In addition, the data here obtained and publicly shared will allow the estimation of appropriate sample sizes and power for these future trials.

Materials and methods
Subjects and study protocol. Subjects that participated in this study were recruited through the Platform for Clinical Trials in Nutrition and Health (GENYAL), at IMDEA Food Institute (Madrid, Spain). Given the novelty of the COVID pandemic, no information about the variability of the data to collect was available, precluding the use of sample size calculations, so we used the largest sample available. Volunteers included in the GENYAL database were contacted to participate in this study. A total of 371 volunteers of European ancestry were recruited with the objective to link data related to genetic variants, anthropometric measurements, dietary patterns and lifestyle with SARS-CoV-2 infection and severity of COVID-19 disease. Inclusion criteria were the following: previous participation in GENYAL Platform studies, adequate level to understand the research propose, agreement to voluntarily participate in the study and provide informed consent. Only those who refused to give their consent to participate in this study were excluded. CONSORT flow diagram is shown in Fig. S1. Although it could have been better to start with a larger sample, we decided to carry on with the sample available at our hands, on the interest on finding novel associations and risk factors for this novel COVID19 pandemic, and using our resources and experience in the nutrigenomics area. This study was performed according to the guidelines present in the Declaration of Helsinki and all procedures involving human subjects were approved by the Research Ethics Committee of the IMDEA Food Foundation (CEI 27-666). Written informed consent to enroll in the study was obtained from all the participants. The trial registration number from ClinicalTrials.gov is NCT04067921.
Anthropometry and lifestyle parameters. Anthropometric measurements such as height, weight and waist circumference were measured following standard validated techniques 54,55 . Body weight was assessed using the body composition monitor BF511 (Omron Healthcare UK Ltd., Kyoto, Japan). Height was assessed using a stadiometer (Leicester-Biological Medical Technology SL, Barcelona). Waist circumferences were measured using a Seca 201 non-elastic tape (Quirumed, Valencia, Spain). The dietary patterns of each participant were assessed using a validated 72-h dietary food record and a food frequency questionnaire 56 . Energy intake, healthy eating index, macro and micronutrients from the dietary records were analyzed using the DIAL software (2.16 version Alce Ingeniería, Madrid, Spain). Regarding lifestyle data, physical activity parameters (Inactive: 0 times of physical activity performance per week/Active: one or more times of physical activity performance per week) and smoking habits (cigarettes/day) were collected from every participants 57 .
DNA extraction and genotyping. Single nucleotide polymorphisms related to lipid metabolism, obesity, circadian rhythm, immune system, and estrogens were previously screened and selected according to previously published procedures 58,59 . Blood samples were obtained from study participants and stored at − 80 °C until DNA extraction. Genomic DNA from each participant was isolated using the QIAamp DNA Blood Mini Kit (Qiagen Sciences, Inc, Germantown, MD, USA). DNA concentration and quality were assured in a nanodrop ND-2000 spectrophotometer (ThermoScientific, Waltham, MA, USA). Gene genotyping was performed using TaqMan OpenArray plates with the QuantStudio™ 12 K Flex Real-Time PCR System (Life Technologies Inc., Carlsbad, CA, USA) according to the manufacturer's instructions 60 . The results obtained were analyzed with the TaqMan Genotyper software 61 . Duplicates of the samples were made, obtaining more than 99% of technically valid genotypes.
COVID-19 data collection. The participants answered a detailed survey of 24 questions related to COVID-19 disease. Questions were stratified into 4 different sections designed to determine: 1-the degree of exposure to SARS-CoV-2, 2-the diagnosis of the disease, 3-symptomatology and severity of the symptoms and 4-associated risk factors. The responses to each of these sections were analyzed and categorized in different clusters according to the following criteria: www.nature.com/scientificreports/ Statistical analysis. The R software, version 3.6.1, was used for all the statistical analyses 62 . A significance level of 0.05 was used in bilateral tests. Missing data was multiple imputed using the mice package, based on conditional chained equations, and using the default imputation methods, namely predictive mean matching for numeric variables, logistic regression for binary variables, polytomous regression for categorical data, and proportional odds logistic regression for ordinal variables 63 . Pooled estimates of parameters and 95% confidence intervals, as well as of p values were obtained afterwards from the multiple imputations 64 . The rms package was used to derive ordinal logistic models in the proportional-odds version. These models were fitted to predict infection status for the whole dataset, and symptoms severity in the infected stratum of the sample. A wide set of 26 SNPs related to metabolism and nutrition was tested for significance in the prediction of these endpoints, and sex, age, risk, exposition to infection, hypertension, symptoms severity and smoking status were used as adjustment variables. Multiple-test correction of p-values was controlled with the Bonferroni method. The model was validated through bootstrap, using 2000 resamples. An additive model was adopted for the SNPs, where the effect of a homozygotic minor allele genotype is twice the effect of the heterozygotic allele genotype. The use of a codominant or a dominant model did not improve the fit in general, so they were not used. The significance of medication and diet variables was tested by including them in the model in the form of an additive term or an interaction with the rs7412 SNP.

Data and code availability
The datasets presented in this article are not readily available because are part of the GENYAL Platform for clinical trials in nutrition and health (https:// www. food. imdea. org/ servi ces/ Platform-Clinical-Trials-Nutritionand-Health) database. This is a database that is currently registered as a collection under the Spanish rules which by policy of the Center will be public afterwards once the data of the entire expected population is gathered.
Requests to access the datasets should be directed to ana.ramirez@imdea.org.